Optimized Transform Coding for Approximate KNN Search
Authors
Abstract
Transform coding (TC) is an efficient and effective vector quantization approach whose compact representation can serve as the basis for a more elaborate hierarchical framework for sub-linear approximate search. However, compared to state-of-the-art product quantization methods, TC shows a significant gap in matching accuracy. One of the main shortcomings of TC is that its bit-allocation solution relies on the assumption that the probability densities of all components of the vector can be made identical after normalization. Motivated by this, we propose optimized transform coding (OTC), in which bit allocation is optimized directly on the binned kernel estimator of each component of the vector. Experiments on public datasets show that OTC achieves performance comparable to state-of-the-art product quantization methods while keeping a learning speed comparable to that of TC.
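The bit-allocation issue described above can be illustrated with a small sketch. This is not the authors' OTC procedure, and it rests on several assumptions. The components are taken to be already decorrelated, so the PCA/KLT rotation step of transform coding is omitted. A plain 64-bin histogram stands in for the binned kernel estimator. Each component is coded with a hypothetical equal-width quantizer whose cells are reconstructed at their centroids. The helper names `histogram`, `binned_distortion` and `greedy_allocation` are illustrative only. Both allocations use the same greedy rule and differ only in how the per-component distortion D_i(b) is obtained: the classical high-rate model D_i(b) = var_i * 4^(-b) assumes every normalized component has the same shape, while the second variant measures D_i(b) on each component's own binned density.

```python
import random
import statistics

def histogram(values, n_bins=64):
    """Binned density estimate of one component: (bin centers, bin probabilities).
    A plain histogram is used here as a stand-in for a binned kernel estimator."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centers, [c / len(values) for c in counts]

def binned_distortion(hist, bits):
    """MSE of a (hypothetical) quantizer with 2**bits equal-width cells, each
    reconstructed at the centroid of the histogram mass it holds; bins only."""
    centers, probs = hist
    lo, hi = centers[0], centers[-1]
    levels = 2 ** bits
    width = (hi - lo) / levels or 1.0
    cells = {}
    for c, p in zip(centers, probs):
        cells.setdefault(min(int((c - lo) / width), levels - 1), []).append((c, p))
    mse = 0.0
    for members in cells.values():
        mass = sum(p for _, p in members)
        if mass > 0:
            mean = sum(c * p for c, p in members) / mass
            mse += sum(p * (c - mean) ** 2 for c, p in members)
    return mse

def greedy_allocation(distortion, n_dims, total_bits):
    """Hand out bits one at a time to the component whose distortion drops most."""
    bits = [0] * n_dims
    for _ in range(total_bits):
        gains = [distortion(i, bits[i]) - distortion(i, bits[i] + 1) for i in range(n_dims)]
        bits[max(range(n_dims), key=gains.__getitem__)] += 1
    return bits

rng = random.Random(0)
# Two already-decorrelated components with nearly equal variance but different shapes.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(2000)]
bimodal = [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.05) for _ in range(2000)]
components = [gaussian, bimodal]

variances = [statistics.pvariance(x) for x in components]
hists = [histogram(x) for x in components]

# Classical high-rate model: D_i(b) = var_i * 4**-b (identical normalized shapes).
classical = greedy_allocation(lambda i, b: variances[i] * 4.0 ** -b, 2, 8)
# Density-aware variant: D_i(b) measured on each component's own binned density.
binned = greedy_allocation(lambda i, b: binned_distortion(hists[i], b), 2, 8)
print("classical:", classical, "binned:", binned)
```

With two components of almost equal variance, the variance-only model splits the 8 bits roughly evenly. Measuring distortion on the binned densities instead shows that a single bit already captures nearly all of the bimodal component's variance, so the remaining bits go to the Gaussian component.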
Similar resources
EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical-structure (tree-based) methods decreases as the dimensionality of data grows, while hashing-based methods usually lack efficiency in practice. Recently, graph-based methods have drawn considerable attention. The ma...
Optimisation of correlation matrix memory prognostic and diagnostic systems
Condition monitoring systems for prognostics and diagnostics can enable large and complex systems to be operated more safely, at a lower cost and with a longer lifetime than is possible without them. AURA Alert is a condition monitoring system that uses a fast approximate k-nearest-neighbour (kNN) search of a time-series database containing known system states to identify anomalous system behavi...
Efficient k-nearest neighbor searches for multi-source forest attribute mapping
In this study, we explore the utility of data structures that facilitate efficient nearest neighbor searches for application in multi-source forest attribute prediction. Our trials suggest that the kd-tree in combination with exact search algorithms can greatly reduce nearest neighbor search time. Further, given our trial data, we found that an enormous gain in search time efficiency, afforded by ...
Evolutionary Nearest Neighbour Classification Framework
Data classification attempts to assign a category or class label to an unknown data object based on an available similar data set whose class labels are already assigned. K-nearest neighbor (KNN) is a widely used classification technique in data mining. When classifying an unknown object, KNN assigns it the majority class label of its closest neighbours. The computational efficie...
Efficient and Effective KNN Sequence Search with Approximate n-grams
In this paper, we address the problem of finding k-nearest neighbors (KNN) in sequence databases using the edit distance. Unlike most existing works, which use short, exact n-gram matchings together with a filter-and-refine framework for KNN sequence search, our new approach allows us to use longer but approximate n-gram matchings as a basis for pruning KNN candidates. Based on this new idea, we de...